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Abstract: A summary of the relationship between the Langevin equation, Fokker-Planck- 
Kolmogorov forward equation (FPKfe) and the Feynman path integral descriptions of stochas- 
tic processes relevant for the solution of the continuous-discrete filtering problem is provided 
in this paper. The practical utility of the path integral formula is demonstrated via some 
nontrivial examples. Specifically, it is shown that the simplest approximation of the path 
integral formula for the fundamental solution of the FPKfe can be applied to solve nonlin- 
ear continuous-discrete filtering problems quite accurately. The Dirac-Feynman path integral 
filtering algorithm is quite simple, and is suitable for real-time implementation. 



Keywords: Fokker-Planck Equation, Kolmogorov Equation, Universal Nonlinear Filtering, 



Path Integrals, path integral filtering . 



Contents 



|l|. Introduction ^ 
^. Review of Continuous-Discrete Filtering Theory § 



2.1 Langevin Equation and the FPKfe 



2.2 Fundamental Solution of the FPKfe 



2.3 Continuous-Discrete Filtering 



Path Integral Formulas 



3.1 Additive Noise 



|3.2| Multiplicative Noise 
Dirac-Feynman Path Integral Filtering 



4.1 Dirac-Feynman Approximation 



4.2 The Dirac-Feynman Algorithm ^ 



4.3 Practical Computational Strategies 



^. Examples 10 
|5l| Example 1 llC 



5J Example 2 |1S 

|. Additional Remarks H 

0. Conclusion 30 

^. Acknowledgements 31 

Summary of Path Integral Formulas 31 

|AT| Additive Noise |31 



A.2 Multiphcative Noise 32 



1. Introduction 

The following continuous-discrete filtering problem often arises in practice. The time evo- 
lution of the state, or signal of interest, is well-described by a continuous-time stochastic 
process. However, the state process is not directly observable, i.e., the state process is a 
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hidden continuous-time Markov process. Instead, what is measured is a related discrete- 
time stochastic process termed the measurement process. The continuous-discrete filtering 
problem is to estimate the state of the system, given the measurements [1] . 

When the state and measurement processes are linear, excellent performance is often 
obtained using the Kalman filter [2,3]. However, the Kalman filter merely propagates the 
conditional mean and covariance, so is not a universally optimal filter and is inadequate 
for problems with non-Gaussian characteristics (e.g., multi-modal). When the state and/or 
measurement processes are nonlinear, a (non- unique) linearization of the problem leads to an 
extended Kalman filter. If the nonlinearity is benign, it is still very effective. However, for the 
general case, it cannot provide a robust solution. 

The complete solution of the filtering problem is the conditional probability density func- 
tion of the state given the observations. It is complete in the Bayesian sense, i.e., it contains 
all the probabilistic information about the state process that is in the measurements and the 
initial condition. The solution is termed universal if the initial distribution can be arbitrary. 
From the conditional probability density, one can compute quantities optimal under various 
criteria. For instance, the conditional mean is the least mean-squares estimate. 

The solution of the continuous-discrete filtering problem requires the solution of a linear, 
parabolic, partial differential equation (PDE) termed the Fokker-Planck-Kolmogorov forward 
equation (FPKfe). There are three main techniques to solve the FPKfe type of equations, 
namely, finite difference methods [4, 5], spectral methods [6], and finite/spectral element 
methods [7]. However, numerical solution of PDEs is not straightforward. For example, the 
error in a nai've discretization may not vanish as the grid size is reduced, i.e., it may not be 
convergent. Another possibility is that the method may not be consistent, i.e., it may tend 
to a different PDE in the limit that the discretization spacing vanishes. Furthermore, the 
numerical method may be unstable, or there may be severe time step size restrictions. 

The fundamental solution of the FPKfe can be represented in terms of a Feynman path 
integral [8]. The path integral formula can be derived directly from the Langevin equation. 
A textbook discussion for derivation of the path integral representation of the fundamental 
solution of the FPKfe corresponding to the Langevin equations for additive and multiplicative 
noise cases can be found in [9] and [10]. These results have been simplified and generalized 
for the general case in [11] and [12]. In this paper, it is demonstrated that the simplest 
approximate path integral formulae lead to very accurate solution of the nonlinear continuous- 
discrete filtering problem. 

In Section ^, the basic concepts of continuous-discrete filtering theory is reviewed. In 
Section |3|, the path integral formulae for the case of additive and multiplicative noise cases 
are summarized. In Section ^, an elementary solution of the continuous-discrete filtering 
problem is presented that is based on the path integral formulae. Some examples illustrating 
the path integral filtering is presented in the following section. Some remarks on practical 
implement ational aspects of path integral filtering is presented in Section The appendix 
summarizes the path integral results derived in [11] and [12]. 
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2. Review of Continuous-Discrete Filtering Theory 

2.1 Langevin Equation and the FPKfe 

The general continuous-time state model is described by the following stochastic differential 
equation (SDE): 

dx{t) = f{x{t),t)dt + e{x{t),t)d\/{t), x{to) = xq. (2.1) 

Here x(t) and f(x(t),t) are n— dimensional column vectors, the diffusion vielbein e(x(t), t) is an 
nxpe matrix and v{t) is a pedimensional column vector. The noise process v(t) is assumed to 
be Brownian with covariance Q{t) and the quantity g = eQe^ is termed the diffusion matrix. 
All functions are assumed to be sufficiently smooth. Equation |2.1| is also referred to as the 
Langevin equation. It is interpreted in the Ito sense (see Appendix ^). Throughout the 
paper, bold symbols refer to the stochastic processes while the corresponding plain symbol 
refers to a sample of the process. 

Let (To(x) be the initial probability distribution of the state process. Then, the evolution 
of the probability distribution of the state process described the Langevin equation, p{t,x), 
is described by the FPKfe, i.e., 

'dp "5 1 " 9^ 

^ ^) = - E 5^ [/^(^' *)^'(*' -)] + 2 E Mt, x)pit, X)] , ^^^^^ 

p{to,x) = ao{x). 

2.2 Fundamental Solution of the FPKfe 

The solution of the FPKfe can be written as an integral equation. To see this, first note 
that the complete information is in the transition probability density which also satisfies the 
FPKfe except with a 5— function initial condition. Specifically, let t" > t', and consider the 
following PDF: 




^ P{t',x"\t',x') = 6''ix" -x'). 

(2.3) 

Such a solution, i.e., P{t,x\t' ,x'), is also known as the fundamental solution of the FPKfe. 
From the fundamental solution one can compute the probability at a later time for an arbitrary 
initial condition as follows: 

p{t", x") = J P{t", x"\t', x')p{t', x') . (2.4) 

In this paper, all integrals are assumed to be from — oo to +oo, unless otherwise specified. 
Therefore, in order to solve the FPKfe it is sufficient to solve for the transition probability den- 
sity P{t,x\t' ,x'). Note that this solution is universal in the sense that the initial distribution 
can be arbitrary. 



- 3 - 



2.3 Continuous-Discrete Filtering 

In this paper, it is assumed that the measurement model is described by the following discrete- 
time stochastic process 

y{tk) = h{x(tk),\M{tk),tk), A; = 1,2,..., tk>to, (2.5) 

where y{t) £ W^^^,h £ M™^^, and the noise process w(t) is assumed to be a white noise 
process. Note that y(to) = 0. It is assumed that p{y{tk)\x{tk)) is known. 

Then, the universal continuous-discrete filtering problem can be solved as follows. Let 
the initial distribution be cro(x) and let the measurements be collected at time instants 
ti,t2, ■ ■ ■ ,tk, ■ ■ ■■ Let p(tfc_i|y(tfc_i)) be the conditional probability density at time tk-i, 
where Y{t) = {y{ti) : < ti < t}. Then the conditional probability density at time tk, after 
incorporating the measurement y{tk), is obtained via the prediction and correction steps: 

p{tk,x\Y{tk-i)) = j P{tk,x\tk-i,x^-i)p{t^^i,x^_i\Y{t^_i)) {d"'Xk^i] , (Prediction Step), 

(2.6) 

Often (as in this paper), the measurement model is described by an additive Gaussian noise 
model, i.e., 

y(tfc) = /i(x(tfc),tfc) +w(tfc), fc = l,2,..., tk>to, (2.7) 
with w(t) ~ iV(0,i?(t)), i.e., 

p{y{tk)\x) = - — ^ ^- — -—772 exp I -\{y{tk) - h{x{tk),tk)Y'{R{tk))~^{y{tk) - h{x{tk),tk)) 

{{2tt)"^ det R{tk)r ^ ^ 

(2.8) 

Observe that, as in the PDE formulations, one may use a convenient set of basis functions. 
Then, the evolution of each of the basis functions under the FPKfe follows from Equation 
Since the basis functions are independent of measurements, the computation may be 
performed off-line. Finally, note that this solution of the filtering problem is universal. In 
conclusion, the determination of the fundamental solution of the FPKfe is equivalent to 
the solution of the universal optimal nonlinear filtering problem. A solution for the time 
independent case with orthogonal diffusion matrix in terms of ordinary integrals was presented 
in [13]. However, the integrand is complicated and not easily implementable in practice. In 
the next section, the fundamental solution for the general case in terms of path integrals is 
summarized. It is shown that it leads to formulae that are simple to implement. 

3. Path Integral Formulas 

In this section, path integral formulae derived in [11] and [12] are summarized. It is assumed 
that t" > t'. Details on the formulae are summarized in Appendix 
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3.1 Additive Noise 

When the diffusion vielbein is independent of the state, i.e., 

dx{t) = f{x{t),t)dt + e{t)dv{t), (3.1) 

where all quantities are as defined in Section II-C, the noise is said to be additive. The path 
integral formula for the transition probability density is given by 

(•x{t")=x" ( ft" 

lx{t')=x' 

where the Lagrangian L^^\t, x, x) is defined as 



P{t",x"\t',x') = [^x{t)]exp(- dtLM(t,x,x) , (3.2) 

Jx(t')=x' \ Jt' I 



-.72 ^ Pi M 

L^'\t, X, i) = 2 E (^^ - /^(^^^'^ 9i-{t) (x, - /,(x«(t), t)) + r ^ ^(x, t), (3.3) 



i=l 1=1 



and 



9^jit) = J2 e^a{t)Qab{t)ebj{t), (3.4) 



a,b=l 



and 



[^x{t)] = ^ ' hm n , dMt' + ke) 

^(2^e)" det git') J-^A det g{t' + ke) 

Here, r G [0, 1] specifies the discretization of the SDE; see Appendix H for details. The 
quantity S{t" ,x') = j^, L^^'(t^x,x)dt is referred to as the action. 

3.2 Multiplicative Noise 

The state model for the general case is given by 

dx{t) = f{x{t),t)dt + e{x{t),t)d\i{t). (3.6) 

As discussed in more detail in Appendix there is ambiguity in the definition of this SDE 
which is due to the fact that d\/{t) ~ 0(\/dt). The path integral formula for the general 
discretization is complicated and summarized in Appendix In the simplest Ito case, it 
reduces to 

,.x(t")=x" ( ft" \ 

P{t",x"\t',x') = [^x(t)]exp - / dtL^^'^\t,x,±)) , (3.7) 



lx{t')=x' \ Jt' 

where the Lagrangian , X, x) is defined as 

L(^'0)(t,x,i;) = if;(x,-/,(xW(t),t))5,^i(x(0)(t),t) (i, - /,(xM(t), t)) +rf;M(x«,t), 

i=l i=l 

(3.8) 
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and 

9ij{x^'\t),t) = £ eia{x^°\t),t)Qaj,{t)ebj{x^''\t),t). (3.9) 

a,b=l 

A nice feature of the Ito interpretation is that the formula is the same as that for the simpler 
additive noise case (with some obvious changes). 

Note that it is always possible to convert from a SDE defined in any sense (say, Stratanovich 
or s = 0) to the corresponding Ito SDE. Therefore, this can be considered to be the result for 
the general case. 



4. Dirac-Feynman Path Integral Filtering 

The path integral is formally defined as the — > oo limit of a A multi-dimensional integrals 
and yields the correct answer for arbitrary time step size. In this section, an algorithm for 
continuous-discrete filtering using the simplest approximation to the path integral formula, 
termed the Dirac-Feynman approximation, is derived. 

4.1 Dirac-Feynman Approximation 

Consider first the additive noise case. When the time step e = t" — t' is infinitesimal, the path 
integral is given by 

P{t' + €,x"\t',x') = 
where the Lagrangian is 



1 



^{27re)^detg{t') 



exp 



-eL('^)(i,x',x", {x" - x')/e)\ , (4.1) 



E 



x'! - x\ 



-f,{x' + r{x"-x'),t) 



i,j=l 



x'' - x' 



^ ^ - fj{x' + r{x" -x'),t) 



+ rj^^{x' + r{x"-x'lt). 



i=l 



This leads to a natural approximation for the path integral for small time steps: 



P{t" , x"\t' , x'^ 



1 



: exp 



A special case is the one-step pre-point approximate formula 

1 



P{t",x"\t',x''' 



it" - t') A 



{t"-t') 



(4.2) 



eL^''\t,x',{x" -x')/it" -t')) . (4.3) 



(4.4) 



_ f.(x' t') 
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The one-step symmetric approximate path integral formula for the transition probability 
amplitude (as originally used by Feynman in quantum mechanics [8]) is 

1 



P{t",x"\t',x') = 



X exp — 



{t" - 1' 



E 



{t"-t') 



fi{x,t) 



(4 - 4) 
L it"-t') 



fj{x,t) 



(4.5) 



i=l 



where x = ^{x" + x') and t = ^{f + t"). Note that for the explicit time-dependent case 
the time has also been symmetrized in the hope that it will give a more accurate result. Of 
course, for small time steps and if the time dependence is benign, the error in using this or 
the end points is small. 

Similarly, for the multiplicative noise case in the Ito interpretation/discretization of the 
state SDE the following approximate formula results: 

1 



P{t\x"\t',x'' 



exp 



where the Lagrangian L^'^'^^{t,x' ,x" , {x" — x')/(t" — t')) is given by 



(t" -t')L^^^^\t,x',{x" ~x')/{t" -t')) 



{x" - 


x') 








<) 


it"- 


t') 



(4.6) 



(4.7) 



I E (j!ri^-f^ix' + r{x''-x'),t')\ I f2 e,a{x',t')Qa,{t')ej,ix',t')\ 

i,j=l ) ^ \a,b=l J 



-1 



For the multiplicative noise case, the simplest one-step approximation is the pre-point dis- 
cretization where r = s = 0: 

1 



P{t",x"\t',x" 



y/{27r{t" -t'))'^detg{x',t') 
X exp 



(4.8) 



<*"-'''^'itf-/.(^'.o).-(.'..')(<|^-«.'.') 



{t"-t') 

Since s = 0, this means that we are using the Ito interpretation of the state model Langevin 
equation. When r = 1/2, it is termed the Feynman convention, while s = 1/2 corresponds to 
the Stratanovich interpretation. 

4.2 The Dirac-Feynman Algorithm 

The one-step formulae discussed in the previous section lead to the simplest path integral 
filtering algorithm, termed the Dirac-Feynman (DF) algorithm. The steps for DF algorithm 
may be summarized as follows: 
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1. 



From the state model, obtain the expression for the Lagrangian. Specifically, 



• For the additive noise case, the Lagrangian is given by Equation |3.3| ; 

• For the multiplicative noise case with Ito discretization the Lagrangian is given by 
Equation |3.8| , while for the general discretization the action is given in Appendix 

E 

Determine a one-step discretized Lagrangian that depends on r G [0, 1] (and s £ [0, 1] 
for the multiplicative noise). The usual choice is r = 1/2. 

Compute the transition probability density P{t" ,x"\t' ,x') using the appropriate for- 



mula (e.g.. Equations 4.5 or 4.8). The grid spacing should be such that the transition 



probability tensor is adequately sampled, as discussed below. 
At time 

(a) The prediction step is accomplished by 

pitk\Y{tk-i)) = [ P{tk, x\tk-i,x')p{tk^i,x'\Yitk-i)) . (4.9) 



Note that p{to\Y{tQ)) is simply the initial condition p{to,XQ) = ao{xo). 
(b) The measurement at time are incorporated in the correction step via 

p{y{tk)\x)p{tk,x\Y{tk-i)) . . 

p[tk,x\Y [tk)) - . , , ... , . , Tvl" ?!• (4-lU) 

J P{y{ik)\Op{ik,i\Y{tk-i)) 

4.3 Practical Computational Strategies 

The above general filtering algorithm based on the Dirac-Feynman approximation of the 
path integral formula computes the conditional probability density at grid points. This can 
be computationally very expensive as the number of grid points can be very large, especially 
for larger dimensions. Here, a few approximations will be presented that drastically reduces 
the computational load. 

The most crucial property that is exploited is that the transition probability density is an 
exponential function. Consequently, many elements of the transitional probabilty tensor are 
negligible small, the precise number depending on the grid spacing. A significant computa- 
tional saving is obtained when the (user-defined) "exponentially small" quantities are simply 
set to zero. In that case, the transition probability density is approximated by a sparse tensor, 
which results in huge savings in memory and computational load. 

The next key issue is that of grid spacing. An appropriate grid spacing is one that 
adequately samples the conditional probability density. Of course, the conditional probability 
density is not known, but its effective domain (i.e., where it is significant) is clearly a function 
of the signal and measurement model drifts and noises. For instance, the grid spacing should 
be of the order of change in state expected in a time step, which is not always easy to 
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determine for a generic model. However, if the measurement noise is small, finer grid spacing 
is required so as to capture the state information in precise measurements. However, if the 
measurement noise is large, it may be unnecessary to use a fine grid spacing even if the state 
model noise is very small since the measurements are not that informative. Alternatively, If 
the grid spacing is too large compared to the signal model noise viclbcin term, replace the 
difusion matrix with an "effective diffusion matrix" that is taken to be a constant times the 
grid spacing, i.e., noise inflation. This additional approximation can still lead to useful results 
as shown in an example below. 

It is also noted that the grid spacing is a function of the time steps. This is analogous to 
the case of PDE solution via discretization. Thus, when using the one-step DF approximation, 
there will not be a gain by reducing the grid spacing to smaller values (and at the cost 
of drastically increasing processing time). It is then more appropriate to use multi-step 
approximations to get more accurate results. 

Here are some possibilities for practical implemenetation: 

1. Precompute the transition probability tensor wth pre-determined and fixed grid. There 
are two options: 

(a) Compute the correction at all the grid points; 

(b) Compute the correction only where the prediction result is significant. 
It is the second of those options that will be used in this paper. 

2. Another option is to use a focussed adaptive grid, much as in PDE approaches. Specif- 
ically, at each time step: 

(a) Find where the prediction step result is significant; 

(b) Find the domain in the state space where the conditional probability density is 
significant, and possibly interpolate. For the multi-modal case, there would be 
several disjoint regions; 

(c) Compute the transition probability tensor with those points as initial points and 
propagate to points in region suggested by state model. 

Thus, the grid is moving. In this case, the grid can be finer than in the previous case, 
although then the computational advantage of pre-computation is lost. 

3. Precompute the solution using basis functions. For instance, in many applications 

wavelets have been known to represent a wide variety of functions arising in practice. 
Then, instead of using the transition probability tensor, FPKfe solutions with wavelet 
basis functions are stored. 
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5. Examples 



In this section, a couple of two-dimensional examples are presented that illustrate the utility 
of the path integral formulae presented in this paper. The signal and measurement models are 
both nonlinear in these examples. Therefore, the Kalman filter is not a reliable solution for 
these problems. The symmetric discretization formula was used. The MATLAB tensor tool- 
box developed by Bader and Kolda was used for the computations [14]. The approximation 



techniques are discussed in Section 4.3. In addition, in order to speed up the pre-computation 
of the transition probability tensor, it was assumed that -P(/i, /2|^i, ^2) = if \ fr — ir\ > 2, 
i.e., the "extent" of P was chosen to be 2. Thus, this implementation of the DF algorithm is 
sub-optimal in many ways. 

For comparison, the performance of the SIR particle filter based on the Euler discretiza- 
tion of the state model SDE is also included [15]. The MATLAB toolbox PFlib was used in 
the particle filter simulations [16]. 



5.1 Example 1 



A signal process sample (o = 0.001, o =0.03) 



X U - 




Figure 1: A Sample trajectory for the state model in Equation 5.1 



A measurement process sample {a^= 0.2) 




Time 



Figure 2: A measurement sample for the state trajectory in Figure |l| and measurement model given 
by Equation p. 4 



Consider the state model 



dxi{t) = (-189x^(t) + 9. 16x2 (t)) dt + a^^dviit), 



1 



with the nonlinear measurement model 



-dt + fTx2dV2(t), 



sm 



X2(ifc) 



Here 



0.001 0.03 



(5.1) 



(5.2) 



and we consider two values for dy, namely ay = 0.2 and Uy = 2. 
This example was studied in [17] and the extended Kalman is known to fail for this model. 
The Lagrangian for this model is easily seen to be given by 

1 1 / 1 \ ^ 

—^{xi{t) + 189xUt)-9.16x2{t)y + ^i^X2{t) + -j . (5.3) 
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Conditional Mean with 2o bounds for the sample path (measurement noise 0^=0.2): Nx1 =42Nx2=42. 




0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 

Figure 3: Conditional mean Xi(t)) computed for the measurement sample of Figure |l|. 



Consider first the cjy = 0.2 case. Figure |l] shows a sample trajectory with initial state 
[0.37,0.31], and in Figure |2| is shown a measurement sample. The time step size is 0.01 and 
the number of time steps is 200. 

The spatial interval [—0.8, 0.8] x [—0.8, 0.8] is subdivided into 42 x 42 equal intervals. 
The signal model noise is very small requiring much finer grid spacing. Instead, as discussed 

Axi Ax2 



with a = 1. The initial 



in Section |4.3| , the effective cr's were taken to be a x 
distribution is taken to be uniform. 

The conditional mean for xi(t) and X2(t) are plotted in Figures ^ and ^. The RMS error 
was found to be 0.1180 and the time taken was 40 seconds. Observe that the conditional 
mean is within a standard deviation of the true state almost all of the time confirming that 
the tracking quality is good. It is noted that the variance is larger for xi(t). The reason can 
be understood from Figure ^ which plots the marginal conditional probability density of the 
state variable xi(t) — it is bi-modal. The bi-modal nature (for a significant fraction of the 
time) is the reason the EKF will fail in this instance. The performance is seen to be similar 
to that reported in [17] which was obtained using considerably more involved techniques and 
finer grid spacing. 

This example was also investigated using the SIR-PF. The SIR-PF implemented with 
5000 particles took about 155 seconds and the RMS error was found to be 0.165 even when 
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Conditional Mean with 2o bounds for the sample path (measurement noise o =0.2): Nx1 =42Nx2=42. 



0.5 




-0.4 - 



_Q 5 I I I I I I I I I I I 

■ 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 

Figure 4: Conditional mean X2(t)) computed for the measurement sample of Figure |l|. 

initiated about the true state, i.e., initial distribution was chosen to be Gaussian with mean 
[0.37,0.31] and variance I2, where I2 is the 2x2 identity matrix. When the variance was 
reduced to 10^^ x /, it resulted in RMS error of only 0.022. Thus, the performance of the 
SIR-PF depends crucially on the initial condition. It is also noted that no bi-modality of 
the marginal pdf of state xi(t) at T = 1 was observed for the SIR-PF simulations when 
the number of particles was 5000. Upon increasing the number of particles to 10, 000, the 
bi-modality was noted, although the RMSE was not significantly smaller. 

Next consider the larger measurement noise case. A measurement sample corresponding 
to the state trajectory in Figure Q is shown in Figure |6[ The spatial interval [—1.6, 1.6] x [—1, 1] 
was subdivided into 62 x 62 equispaced grid points. The RMS error was found to be 0.128 
and the time taken was about 110 seconds. The bimodality at T = 1 is evident in this case 
as well (see Figure |^ . 

The SIR-PF was also implemented. When initialized as Gaussian with zero mean and 
unit variance, the tracking performance of the SIR-PF failed; the RMSE was found to be 
25.34 when using 5000 particles (taking about 110 seconds). A sample performance is shown 
in Figures |8| and ^; it is clear that the state xi is poorly tracked. Even when using 50,000 
particles, a sample run that took 25 minutes, resulted in RMS error of 16.53. 
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Marginal conditional probability distribution of at time T=1 . 
0.25 1 1 , 1 , , , 



0.2 - 



0.05 - 








-0.8 -0.6 -0.4 -0.2 0.2 0.4 0.6 0.8 

Figure 5: Marginal conditional probability density for xi{t). 
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A measurement process sample (0^= 2) 

8 I \ 1 1 



6 - 




0.5 1 1.5 2 2.5 

Time 

Figure 6: A measurement sample for the state trajectory in Figure |l] for larger measurement noise 

K = 2.) 
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Marginal conditional probability distribution of at time T=1 . 
0.06 I 1 , 1 1 1 1 



0.05 - 



0.04 - 



Q. 



0.02 - 



0.01 - 








-2 -1.5 -1 -0.5 0.5 1 1.5 2 

Figure 7: Marginal conditional probability density for xi(f) for the larger measurement noise case.. 
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Conditional Mean for state x. 
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50 100 150 200 250 

Figure 8: Conditional mean for state xi(t) computed using 5000 particles for cry = 2. 
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5.2 Example 2 




Figure 10: A sample trajectory for the state model in Equation dA. 

Consider the state model 

dxi{t) = (-X2(t) + cos(xi(t))dt + dvi(t), (5.4) 
dx2{t) = (xi(t) + sin(x2(t)))dt + d\i2{t), 
and the measurement model 

c?yi(ifc) = x?(tfc) + wi(tfc), (5.5) 

dy2{tk) = X2(tfc) + W2(ifc). 
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A measurement process sample (a = 1) 




Figure 11: A measurement sample for the state trajectory in Figure 00 and measurement model 



given by Equation 5.5 



Here d\/\{t) are uncorrelated standard Wiener processes, and Wi{tk) ~ A^(0, 1). A sample 
of the state and measurement processes are shown in Figures and |ll] respectively. The 
discretization time step is 0.01. 

The initial distribution is taken to be a Gaussian with zero mean and variance of 10. 



Figures 12 and |1^ plots the conditional mean for the state. It is seen that the tracking is 
quite good despite the error at the start; the RMS error was found to be 0.54. The interval 
[—6, 6] was uniformly divided into 62 grid points and the extent of the transition probability 
tensor was 2. The 2000 time steps took about 8 minutes. 
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Figure 12: Conditional mean for state (xi(i) computed for the measurement sample of Figure pi] and 
with initial distribution A^(0, 10. 

Next, consider time step of 0.2, i.e., only every twentieth measurement sample is assumed 



given. Figures 15 and 16 show the conditional means of the states. The number of grid points 
is smaller; the grid spacing is chosen to be twice the previous instance. Consequently, the 
computational effort is less, requiring only about 14 seconds. It is noted that the tracking 
performance is very good and the error estimated form the conditional probability density 
using this approximation is reliable. Now the RMS error is found to be 0.69, and only 0.31 if 
the first few errors are ignored. 

In contrast, the error using the SIR-PF (with 2000 particles that took 37 seconds) is 
found to fail with RMS error of 3.68; see Figures |l^ and ^ for typical results. The results for 
increasing the number of particles to 50, 000 (14 minutes execution time) did not improve the 
situation significantly (RMS error of 3.34); it would be better to subdivide the time step into 
several time steps and do the SIR-PF. For the purposes of this paper, it is sufficient to note 
that in this instance a single-step DF algorithm succeeds where the one-step SIR-PF fails. 
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Figure 13: Conditional mean for state (x2(t) computed for the measurement sample of Figure [Til and 
with initial distribution iV(0, 10. 
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Figure 14: A measurement sample for the state trajectory in Figure |lO| and measurement model 
given by Equation x5 with measurement interval of T = 0.20. 
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6. Additional Remarks 

It is remarkable to note that the simplest approximations to the path integral formulae leads 
to very accurate results. Note that the time steps are small, but not infinitesimal. Such time 
step sizes are not unrealistic in real world applications. 

It is particularly noteworthy since it was found that SIR-PF was not a reliable solution 
to the studied problems. Note that the rigorous results for MC type of techniques assume 
that the signal model drifts are bounded. If that is not the CclSG^ clS here, the SIR-PF is not 
guarenteed to work well. In any case, the speed of convergence to the correct solution is not 
specified for a general filtering problem, as emphasized in [18]; PFs need to be "tuned" to 
the problem to get desired level of performance. In fact, for discrete-time filtering problems, 
excellent performance also follows from a well-chosen grid using sparse tensor techniques [19]. 
Clearly, it is not axiomatic that a generic particle filter will lead to significant computation 
savings (or performance) over a well-chosen sparse grid method, at least for smaller dimen- 
sional problems. 

Observe also that the DF path integral filtering formulae have a simple and clear physical 
interpretation. Specifically, when the signal model noise is small the transition probability 
is significant only near trajectories satisfying the noiseless equation. The noise variance 
quantifies the extent to which the state may deviate from the noiseless trajectory. 

The following additional observations can be made on the simplest path integral filtering 
method proposed in this paper: 

1. In universal nonlinear filtering, including path integral filtering discussed here, the stan- 
dard deviation computed from the (accurately computed) conditional probability den- 
sity is a reliable measure of the filter performance. This is not the case for suboptimal 
methods like the EKF. 

2. In the examples studied, only the simplest one-step approximate formulae for the path 
integral expression were applied. There is a large body of work on more accurate one- 
step formulae that could be used to get better results if the formulae used in this paper 
are not accurate enough (see, for instance, [10]) . 

3. Observe that higher accuracy (than the DF approximation) is attained by approximat- 
ing the path integral with a finite-dimensional integral. The most efficient technique 

for evaluating such integrals would be to use Monte Carlo or quasi Monte Carlo meth- 
ods. Another possibility is to use Monte Carlo based techniques for computing path 
integrals [20]. Observe that this is different from particle filtering. 

4. The major source of computational savings following from noting that the transition 
probability is given in terms of an exponential function. This implies that P{t" ^ x"\t' ^ x') 
is non-negligible only in a very small region, or the transition probability density tensor 
is sparse. The sparsity property is crucial for storage and computation speed. 
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5. In the example studied in Section 5.1 the grid spacing was larger than the noise. Since 
the grid must be able to sample the probability density, the effective noise vielbein was 
taken to be a constant (1 in our example) times the grid spacing, i.e., the signal model 
noise term is "inflated". Of course, this means that the result is not as accurate as 
the solution that uses the smaller values for the noise. However, it may still lead to 
acceptable results (as in the first example) at significantly lower computational effort. 

6. Observe that even with the coarse sampling the computed conditional probability den- 
sity is "smooth". It seems apparent that a finer spatial grid spacing (with the same 
temporal grid spacing)will yield essentially the same result (using the DF approxima- 
tion) at significantly higher computational cost. This was observed in the two examples 
studied in this paper. Of course, a multiple time step approximation would be more 
accurate. 

7. Also note that the conditional mean estimation is quite good, i.e., of the order of 
the grid spacing, even for the coarser resolutions. This confirms the view that the 
conditional probability density calculated at grid points approximates very well the 
true value at those grid points (provided the computations are accurate). Alternatively, 
an interpolated version of the fundamental solution at coarser grid is close to the actual 
value. This suggests that a practical way of verifying the validity of the approximation 
is to note if the variation in the statistics with grid spacing, such as the conditional 
mean, is minimal. 

8. It is also noted that the PDE-based methods are considerably more complicated for 
general two- or higher-dimensional problems. Specifically, the non-diagonal diffusion 
matrix case is no harder to tackle using path integral methods than the diagonal case. 
This is in sharp contrast to the PDE approach which for higher-dimensional problems 
is typically based on operator splitting approaches. The operator splitting approaches 
cannot be reliable approximation for the general case. 

9. In this paper, the prediction and correction steps were carried out using a uniform grid. 
It is clear that a much faster approach for the correction part for higher dimensional 
problems would be to use Monte Carlo integration methods. 

10. Observe that the one-step approximation of the path integral can be stored more com- 
pactly. Compact representation of the transition probability density, especially in the 
Ito case where it is of the Gaussian form. Even for the general case, the transition 
probability density from a certain initial point and given time step can be stored in 
terms of a few points with the rest obtainable via interpolation. 

11. Observe that the prediction step computation was sped up considerably by restrict- 
ing calculation only in areas with significant conditional probability mass (or more 
accurately, in the union of the region of significant probability mass of p{y{tk)\x) and 
p{x\Y{tk-i))). 
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12. It is noted that when the DF approximation is used with larger time steps, a coarser 
grid is more appropriate, which requires far fewer computations. Thus, a quasi-real- 
time implementation could use the coarse-grid approximation with larger time steps to 
identify local regions where the conditional pdf is significant so that a more accurate 
computation can then be carried out. 

13. In this paper, it has been assumed that the diffusion matrix is invertible. Often this 

arises when it is a matter of defining a state variable as a time derivative of the other 
{dxi{t) = X2(t)dt). It is plausible that the addition of a small, positive perturbation to 
the diffusion matrix (so that the perturbed diffusion matrix is invertible) will not lead 
to large errors. 

14. When the step size is too large, the approximation will not be adequate. However, unlike 
some PDE discretization schemes, the degradation in performance is more graceful. 
For instance, positivity is always maintained since the transition probability density is 
manifestly positive. It is also significant to note that in physics path integral methods 
are used to compute quantities where t' — oo and t" — +oo (sec, for instance, [21]). 
This is not possible by simple discretization of the corresponding PDE due to time step 
restrictions (note that implicit schemes are not as accurate). 

15. For the multiplicative noise case, the choice of s 7^ leads to a more complicated form of 
the Lagrangian. The accuracy of the one-step approximation depends on s in addition 
to r and will be model-dependent. 

16. Note that, unlike the result of S-T. Yau and Stephen Yau in [13], there is no rigorous 
bound on errors obtained for the Dirac-Feynman path integral formulae studied in the 
examples. It is known rigorously for a large class of problems that the continuum path 
integral formula converges to the correct solution [22] . 

7. Conclusion 

In this paper, a new approach to solving the continuous-discrete filtering problem is presented. 
It is based on the Feynman path integral, which has been spectacularly successful in many 
areas of theoretical physics. The application of path integral methods to quantum field 
theory has also given striking insights to large areas of pure mathematics. The path integral 
methods has been shown offer deep insight into the solution of the continuous-discrete filtering 
problem that has potentially useful practical implications. In particular, it is demonstrated 
via non-trivial examples that the simplest approximations suggested by the path integral 
formulation can yield a very accurate solution of the filtering problem. The proposed Dirac- 
Feynman path integral filtering algorithm is very simple and easy to implement and practical 
for modest size problems, such as those arising in target tracking applications. Such formulae 
are also especially suitable from a real-time implementation point of view since it enables us 
to focus computation only on domains of significant probability mass. The application of path 
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integral filtering for radar tracking problems, especially those with significant nonlinearity in 
the state model, will be investigated in subsequent papers. In a recent paper [23], it has been 
shown that the Feynman path integral filtering techniques also leads to new insights into the 
general continuous-continuous nonlinear filtering problem. 
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A. Summary of Path Integral Formulas 

A.l Additive Noise 

The additive noise model 

dx{t) = f{x{t),t)dt + e{t)d\/{t), 
is interpreted as the continuum limit of 

Ax(t)=/(xW(t),t)At + e(t)Av(i), 

where 

x('')(t) = x(t - At) + r(x(t) - x(t - At)). 

Observe that any r G [0, 1] leads to the same continuum expression. 

The transition probability density for the additive noise case is given by (see [11]) 



(A.l) 
(A.2) 
(A.3) 



px(t")=x" ( rt" 

P{t",x"\t\x')= [9x{t)]exp[- dtL^'-\t,x, 

Jx(t')=x' \ Jt' 



(A.4) 



rx{t")=x" 

c{t')- 

where the Lagrangian L^'^\t,x,x) is 

L(^\t,x,x) = lj2[^r- n{x^'\t),t)] gT^Ht) [x, - /,(xW(t),t)] +rj2^M'HtU), 

i=l i=l * 

(A.5) 

and gij{t) = Yl'ab=i ^ia{t)Qab{t)ejb{t), and 



1 ^ 
— lim I I 



d'^xit' + ke) 



A/(27re)'*det5(t') N^oo j^^^ ^(27re)'^ det g{t' + ke) 
This formal path integral expression is defined as the continuum limit of 



(A.6) 



1 f ^ 

^ i27re)^ det g{t'^ J 



d^'xit' + ke) 



V^(27re)" det g{t' + ke) 



e^p{-SP{t",t')), (A.7) 



- 31 - 



where the discretized action Se \t",t') is defined as 

N+l 



2e ^ 



k=l 



X] i^ii^k) - Xi{tk-i) - efi{x^''\tk),tk))gij^{xj{tk) - Xj{tk-i + efj{x^''\tk),tk))) 

(a: 

7V+1 



k=i 
and where 



x'^'^Ktk) = x{tk-i) + r{x{tk) - x{tk-i)). 



(A. 



A. 2 Multiplicative Noise 

Consider the evolution of the stochastic process in the time interval \t' ,t"]. Divide the time 
interval into + 1 equi-spaced points and define ehy t' + {N + l)e = t", or e = jjjpi - Then, 
in discrete-time, the most general discretization of the Langevin equation is 



Xi(tp) - Xi(tp_i) = efi{x^'''{tp),tp) + Ciai^^"' {tp),tp){Vaitp) - Vaitp-l)), 

where p = 1, 2, . . . , A^ + 1, < r, s < 1, and 



(A.IO) 



= Xi{tp-i) + r(xi(tp) - Xi(tp_i)), 



^i^Hh) = Xi(ip-i) + sAxj(t 



p;5 



(A.ll) 



Xi{tp-l) + s{Xi{tp) - Xi(tp_i)). 



In this section, the Einstein summation convention is adopted, i.e., all repeated indices are 



dx 



^ is written as 



(r) 



assumed to be summed over, so that Ciadva = Yla=i ^iadva- Also, 

Note that the change in Equation [A.l[l| when fi{x^^\tp) , tp) and eia{'x^^\tp),tp) are re- 
placed with /j(x(''")(fp), ip_i) and eja(x(*)(ip), ip_i) is of O(e^) and 0(e^/^) respectively. Hence, 
it may be ignored in the continuum limit as it is of order higher than 0(e). 

In summary, there are infinitely many possible discretizations parametrized by two reals 
r,s £ [0,1]. In the continuum limit, i.e., e — > 0, observe that the stochastic process depends 
on s, but not on r. When s = 0, the limiting equation is said to be interpreted as an Ito 
SDE, while when s = ^ , the equation is said to be interpreted in the Stratanovich sense. 

In [12] it is shown that for the general multiplicative noise case 



Pit",x"\t',x') 
where the action S^'^''^^ is given by 



xit")=x" 

[^x(t)]exp(-59"'^)) , 

x{t')=x' ^ ' 



(A.12) 



dt 



o2 r 



(A.13) 
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where 

Pe 

gij= eia{x^'\t)^t)Qab{t)ejbix^'\t),t), (A.14) 

a,b=l 

and 

^(M^ 1^ j2eUx^^\t),t)Qa,{t)^ix^^\t),t) \ , (A.15) 

y a,b=li'=l ^^i' / 

and the probability measure [^x(i)] is given by 

1 

V(27re)'» detg{x('){t"),t") 

The discretized expression for the general case is complicated but can be written down from 
these results in a straightforward manner. 
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